Early Stopping - But When?
Author
Abstract
Validation can be used to detect when overfitting starts during supervised training of a neural network; training is then stopped before convergence to avoid the overfitting ("early stopping"). The exact criterion used for validation-based early stopping, however, is usually chosen in an ad-hoc fashion, or training is stopped interactively. This trick describes how to select a stopping criterion in a systematic fashion; it is a trick for either speeding up learning procedures or improving generalization, whichever is more important in the particular situation. An empirical investigation on multi-layer perceptrons shows that there exists a tradeoff between training time and generalization: from the given mix of 1296 training runs using 12 different problems and 24 different network architectures, I conclude that slower stopping criteria allow for small improvements in generalization (here: about 4% on average), but cost much more training time (here: about a factor of 4 longer on average).

1 Early stopping is not quite as simple

1.1 Why early stopping?

When training a neural network, one is usually interested in obtaining a network with optimal generalization performance. However, all standard neural network architectures, such as the fully connected multi-layer perceptron, are prone to overfitting [10]: while the network seems to get better and better, i.e., the error on the training set decreases, at some point during training it actually begins to get worse again, i.e., the error on unseen examples increases. The idealized expectation is that during training the generalization error of the network evolves as shown in Figure 1. Typically the generalization error is estimated by a validation error, i.e., the average error on a validation set, a fixed set of examples not from the training set.

There are basically two ways to fight overfitting: reducing the number of dimensions of the parameter space or reducing the effective size of each dimension. Techniques for reducing the number of parameters are greedy constructive learning [7], pruning [5, 12, 14], or weight sharing [18]. Techniques for reducing the size of each parameter dimension are regularization, such as weight decay [13] and others [25], or early stopping [17]. See also [8, 20] for an overview and [9] for an experimental comparison. Early stopping is widely used because it is simple to understand and implement and has been reported to be superior to regularization methods in many cases, e.g., in [9].
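As a concrete illustration of how validation-based early stopping can be made non-interactive, here is a minimal sketch of a training loop that halts once the validation error has risen too far above the minimum seen so far (a simple generalization-loss style rule). The names `train_one_epoch`, `validation_error`, the weight get/set helpers, and the 5% threshold are illustrative assumptions, not the specific criteria evaluated in the paper.

```python
# Minimal sketch of validation-based early stopping (illustrative only).
# `train_one_epoch` and `validation_error` are assumed to be supplied by the
# surrounding training code; the threshold of 0.05 is an arbitrary choice.

def train_with_early_stopping(model, train_one_epoch, validation_error,
                              max_epochs=1000, gl_threshold=0.05):
    best_val = float("inf")
    best_weights = None

    for epoch in range(max_epochs):
        train_one_epoch(model)             # one pass over the training set
        val_err = validation_error(model)  # average error on the validation set

        if val_err < best_val:             # remember the best model seen so far
            best_val = val_err
            best_weights = model.get_weights()

        # "Generalization loss": relative increase of the current validation
        # error over its minimum so far; stop once it grows too large.
        gl = val_err / best_val - 1.0
        if gl > gl_threshold:
            break

    if best_weights is not None:           # restore the best weights found
        model.set_weights(best_weights)
    return model
```

In this sketch, a larger threshold corresponds to a "slower" stopping criterion in the sense used above: training continues longer in the hope of a slightly lower minimum of the validation error, at the cost of more training time.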
Similar articles
Comparing different stopping criteria for fuzzy decision tree induction through IDFID3
Fuzzy Decision Tree (FDT) classifiers combine decision trees with approximate reasoning offered by fuzzy representation to deal with language and measurement uncertainties. When a FDT induction algorithm utilizes stopping criteria for early stopping of the tree's growth, threshold values of stopping criteria will control the number of nodes. Finding a proper threshold value for a stopping crite...
Lecture 12: January 5 (2013)
In the previous lecture we saw the algorithm that builds a decision tree based on a sample. The decision tree is built until zero training error. As we saw before, our goal is to minimize the testing error and not the training error. In order to minimize the testing error, we have two basic options. The first option is to decide to do early stopping. Namely, stop building the decision tree at s...
Automatic Early Stopping Using Cross Validation: Quantifying the Criteria (appeared in Neural Networks, 1998)
Cross validation can be used to detect when overfitting starts during supervised training of a neural network; training is then stopped before convergence to avoid the overfitting ("early stopping"). The exact criterion used for cross validation based early stopping, however, is chosen in an ad-hoc fashion by most researchers or training is stopped interactively. To aid a more well-founded selecti...
Automatic early stopping using cross validation: quantifying the criteria
Cross validation can be used to detect when overfitting starts during supervised training of a neural network; training is then stopped before convergence to avoid the overfitting ('early stopping'). The exact criterion used for cross validation based early stopping, however, is chosen in an ad-hoc fashion by most researchers or training is stopped interactively. To aid a more well-founded sele...
Using Early Stopping to Reduce Overfitting in Wrapper-Based Feature Weighting
It is acknowledged that overfitting can occur in feature selection using the wrapper method when there is a limited amount of training data available. It has also been shown that the severity of overfitting is related to the intensity of the search algorithm used during this process. We demonstrate that the problem of overfitting in feature weighting can be exacerbated if the feature weighting ...
Journal title:
Volume, Issue
Pages -
Publication date: 1996